ABSTRACT


The increasing cost and complexity of software in recent years is causing a growing 
interest in the development of measurement technology to evaluate, predict and compare 
software complexity. Metrics can be used throughout the development cycle, providing
valuable information to software developers in order to enhance the final products. The
goal of this thesis is to verify empirically the fault-predictive ability of some software 
complexity metrics and specifically their usefulness during the testing phase. 

A set of eight programs, varying in length from 1,186 to 2,489 lines of Pascal code 
with 157 faults identified with specific modules, provided the data for this study. The 
results of the analysis of the programs using four metrics, cyclomatic complexity, 
bandwidth, nested complexity and number of statements, show that control-structure metrics 
can be effectively used to detect the more fault-prone modules. The nested complexity of
the modules seems to have some relation with the number of faults caused by wrong use
of variables and overrestrictive input checks. These observations can be particularly useful
during the testing phase because testers can use control-structure metrics to predict not only
the modules that may cause more problems but also the more frequent types of faults, and
use the metrics to guide the choice of testing techniques.
I. INTRODUCTION


Software testing and maintenance have been estimated to consume 70% of the overall
software development effort [1]. Testing and debugging costs range from 50% to 80% of
the cost of producing a first working version of a software package [2]. Thus, the 
development of effective error detection techniques is one of the most important issues in 
the effort to reduce the cost and to increase the quality of software. 

There have been many different approaches to software testing and error detection 
such as structural testing, functional testing or correctness proofs. Structural testing 
techniques deal with the degree to which test cases exercise or cover the structure of the 
program. Functional testing techniques are concerned with finding the input values with 
which the program does not behave according to its specifications. Correctness proofs use 
formal languages to specify the requirements and mathematical logic to verify that the 
specifications are achievable by the program. None of these approaches can guarantee to 
isolate all sources of program errors. 

For complete confidence, structural testing strategies require that all the paths in a 
program are tested, but testing all the paths is usually impossible because programs often 
contain an infinite number of paths. This has led to the development of a number of path 
selection criteria. A path coverage criterion is satisfied by certain sets of paths through a
program, where a path is a sequence of statements. An effective criterion requires paths with
a high probability of revealing faults [3].








It has been hypothesized that software errors seem to come in clusters and some areas 
of the programs seem to be more error prone than others [4], thus one of the goals of 
software testing is to detect these areas. Some studies [5] indicate that there is some 
relation between the number of errors found in most computer programs and their logical 
complexity. 

Software testers should select a sufficient number of paths to achieve coverage,
starting with the shorter and simpler functionally sensible paths, and trying to minimize the
number of decision changes from path to path. The fundamental criterion is to ensure that
every instruction has been exercised at least once and every decision (branch or case
statement) has been taken in each possible direction at least once. Associated with each
path the test plan must contain a specification of the inputs that will force that path and a 
specification of all the outputs and database changes expected for that path. The derivation 
of the path-forcing input values is called path sensitizing. 

The path sensitizing process is sometimes very difficult because the input values are 
not obvious. Some paths are confusing, counterintuitive and hard to understand. The 
presence of loops and the fact that the same decision may recur several times in a routine 
can reduce the number of paths through the routine to the point where seemingly sensible 
paths are not achievable. 

It has been hypothesized that one reason why errors are not identified by programmers 
is that they are in parts of the code that are difficult to reach. Our assumption is that the 
source code in those areas should be complex in terms of number of nested control
structures.











One of our purposes is to verify empirically whether there exists some relationship between
the software complexity that can be detected by static analysis at the source code level and 
the actual number of errors found in the modules that have more complexity. It would be 
useful to find some way to identify and differentiate the areas of the program that tend to 
be more difficult to test and debug without having to walk through all the source code. 

Another purpose of this study is to verify whether the number of nested control structures
used in the programs has some relation to the types of errors detected in the areas that 
contain them. 

One of the goals of software engineering is to reduce the costs of software 
development using a disciplined approach. A disciplined approach needs techniques to 
identify or define indices of merit that can support quantitative comparisons and evaluations 
[6]. The software complexity measures may be useful in preparing quality specifications 
and making design tradeoffs between maintenance and development costs [7]. 

The use of measurement technology has been identified as one of the functional tasks 
in the Department of Defense research program Software Technology for Adaptable, 
Reliable Systems (STARS) [8]. This technology concerns the development of evaluation 
criteria, their associated measures and metrics, and the experimental evaluation of 
techniques, methods and tools. The goal is to find ways to measure software attributes so
that we can quantify software, and develop metrics that may be used to compare and predict
software complexity. Some of the more important questions in the study of software
metrics are how well the metrics really represent software complexity and development
effort, and how well the metrics relate to software errors and reliability.








Software complexity can be defined as a measure of how difficult a program is to 
understand, modify and test. The importance of software complexity is represented in 
Figure 1. The goal of any software project is to stay within a reasonable budget and 
maximize the understandability, modifiability and testability of the code. The nature of the
system under development will determine the proper weighting of the different quality
factors to be achieved in the delivered software. Maintainability is typically of primary
importance for business systems. Testability and reliability are critical concerns for
life-support systems software. Efficiency takes precedence in many embedded real-time
systems. Some quality factors, however, are contradictory and difficult to maximize
simultaneously: optimizing code, for example, often lowers its understandability. Software
complexity metrics can be used to monitor and adjust the development effort according to
their values. They can be used to predict the resources that will be required to implement
and test the code, the number of faults that may be found in subsequent testing, or the
difficulty involved in modifying a section of code.

The initial budget and time scheduled for a project influence the complexity of the 
software developed and consequently the quality of the product. The use of more resources 
when the final product does not achieve the quality initially required increases the 
development cost and time. Metrics are tools that can be used to control phases of the 
software cycle, providing feedback information to the project managers and programmers
about the complexity of the product being developed.

Figure 1. Importance of Software Complexity


There has been a great research effort to develop ways of measuring the complexity 
of programs. Using our intuitive notion of software complexity, we expect that complex
programs will cost more to build and test, and will have more latent software errors. 

Any useful measure of complexity must satisfy two basic requirements. First, it must
be computable for every program to which developers apply it. Second, adding
something to a program (instructions, storage, processing time, etc.) must never decrease
the measured complexity. Some complexity measures may serve as good predictors
of particular properties of the programs.











Our hypothesis is that software complexity due to the number of control structures and 
the nesting level has some kind of relation with the degree of difficulty that programmers 
face when they try to test their programs, specifically during the path sensitizing phase. To 
test this hypothesis we analyzed the flow of control, types of control structures and levels
of nesting in some faulty programs using four software complexity metrics: the average
level of nesting, or bandwidth (BW), studied by Jensen and Vairavan [9]; the cyclomatic
number (v(G)), proposed by McCabe [10]; the nested complexity (NC); and the number of
statements (STM).

In Chapter II, we briefly describe some measures of software complexity that have
been proposed, in order to provide a basis for the following discussion.
In Chapter III we present the description of the environment and metrics used to test our
hypothesis, and the data obtained from our analysis. Chapter IV details our
interpretation of the results and what can be done to improve the quality of software during
the development process using software metrics. Finally, in Chapter V, we provide our
conclusions and the possible directions of future work in this area. The Appendix
contains the tables with the metrics and faults, correlation coefficients and analysis of
variance obtained for each version.








II. SURVEY OF SOFTWARE COMPLEXITY MEASURES


In this survey we are only concerned with complexity metrics that can be used for
testing and maintenance purposes.

Many methods to measure software complexity have been proposed and explored.
Software complexity metrics may be classified into two basic types, static and dynamic, as
shown in Figure 2.

Figure 2. Classification of Software Complexity Measures





Static measures are obtained by static analysis of the source code or the high level 
description of the code. Dynamic measures consist of evaluation data collected at run time 
and may change from one execution to another. Dynamic measures may be CPU execution
time, main storage used, database size or computer turnaround time. Static measures may
in turn be divided into four types: control organization metrics, data organization metrics,
volume metrics and composite metrics.

The following sections give an overview of some of the static measures that have been
described in the software complexity research.


A. CONTROL ORGANIZATION METRICS 

The control organization metrics are measures of the comprehensibility of control 
structures. These metrics use the structure of the source code to quantify the degree of 
complexity of the programs. Most of them use the structure of the algorithm represented 
by a directed graph called the control-flow graph. For each structured program module it 
is possible to get a directed graph with a unique entry node and a unique exit node. Each 
node in the graph corresponds to a block of code in the program where the flow is 
sequential and the arcs correspond to branches taken in the program. 

1. Cyclomatic Complexity

The metric originally proposed by McCabe [10] uses mathematical concepts from
graph theory applied to control-flow graphs. The cyclomatic complexity v(G) of a
control-flow graph G with n vertices, e edges and p connected components is:

$$v(G) = e - n + 2p$$


The cyclomatic number in a strongly connected graph, a graph where each node 
is reachable from every other node, is equal to the maximum number of linearly 
independent paths. In a module, this has been shown to be equal to the count of the 
number of decision statements in the module plus one. Thus, the cyclomatic complexity 
of a control-flow graph gives us the minimum number of paths that we should test to 
achieve independent path coverage. It has been proved that the cyclomatic complexity for 
a program with several modules is just the sum of the cyclomatic complexities of the 
individual modules. 
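
As an illustration of the definition, the following minimal Python sketch computes v(G) from a list of edges. The function name `cyclomatic_complexity`, the edge-list representation and the example graph (one IF-THEN-ELSE followed by one WHILE loop) are assumptions made for this example only; they are not taken from the programs analyzed in this study.

```python
def cyclomatic_complexity(edges, num_components=1):
    """Return v(G) = e - n + 2p for a control-flow graph.

    edges is a list of (source, destination) pairs; the node set is
    inferred from the edges, so isolated nodes are not counted.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


# Hypothetical module: one IF-THEN-ELSE followed by one WHILE loop.
example_edges = [
    ("entry", "if"),
    ("if", "then"), ("if", "else"),
    ("then", "while"), ("else", "while"),
    ("while", "body"), ("body", "while"),
    ("while", "exit"),
]

# 8 edges, 7 nodes, 1 component: v(G) = 8 - 7 + 2 = 3,
# which matches the two decision statements plus one.
v = cyclomatic_complexity(example_edges)
```

The result agrees with the rule stated above: the example contains two decision statements, so its cyclomatic complexity is three.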

The cyclomatic complexity metric seems to have some relation with the number 
of software errors and the debugging effort [2]. McCabe claimed that an upper bound for 
cyclomatic complexity equal to 10 seems to be a reasonable, but not magical, upper limit
for software modules. The intention is to keep the size of the modules manageable and 
allow for testing all the independent paths. 

Another metric that uses the same concept of counting the number of changes 
in the flow of control is the count of decisions DE. Usually the sequential flow of control 
may be interrupted in three different ways: forward branches (IF-THEN-ELSE or CASE 
statements), backward branches (WHILE or REPEAT statements), and horizontal branches 
(procedure calls). An easy way of measuring the number of decisions is to count the 
predicates that affect the control flow. For instance an IF statement with two conditions is 
going to contribute two to the count of decisions. The same rule applies to the CASE
statement, which can be considered an IF statement with multiple predicates.


Gilb proposed two other metrics: $C_L$, the absolute logical complexity, which is
the number of binary decisions, and $c_L$, the relative logical complexity, which is the ratio
of $C_L$ to the number of executable statements [11]. These metrics have been supported by
some empirical evidence, and the latter may be considered an improvement over pure
control metrics as it also takes size into account.

Another control metric, NPATH, has been recently proposed by Nejmeh [12].
NPATH is a count of the number of acyclic execution paths through a function. It has been
used with functions written in C at AT&T Bell Laboratories. The author claims that this
metric overcomes three shortcomings of v(G). First, the number of acyclic paths
in a flow graph varies from a linear to an exponential function of v(G). Thus, the number
of acyclic execution paths that may not be tested by a methodology based on that metric
varies from 0 to $2^n$, where n is the number of vertices in the flow graph. Second,
v(G) treats different control structures (IF, WHILE, FOR) in the same way.
Finally, v(G) does not consider the level of nesting.
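
The first shortcoming can be illustrated with a small sketch. The Python code below is not Nejmeh's NPATH algorithm, which is defined on the syntax of C functions; it simply counts start-to-end paths in an acyclic flow graph and compares the count with v(G). The helper names `count_acyclic_paths` and `sequential_ifs`, and the graph layout, are assumptions made for this example.

```python
from functools import lru_cache


def count_acyclic_paths(successors, start, end):
    """Count distinct start-to-end paths in an acyclic graph.

    successors maps each node to the list of its successor nodes.
    """
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == end:
            return 1
        return sum(paths_from(nxt) for nxt in successors.get(node, []))

    return paths_from(start)


def sequential_ifs(k):
    """Build a hypothetical flow graph of k IF-THEN-ELSE statements in sequence."""
    graph = {}
    for i in range(k):
        join = f"if{i + 1}" if i + 1 < k else "exit"
        graph[f"if{i}"] = [f"then{i}", f"else{i}"]
        graph[f"then{i}"] = [join]
        graph[f"else{i}"] = [join]
    return graph


graph = sequential_ifs(4)
edge_count = sum(len(targets) for targets in graph.values())
node_count = len(graph) + 1          # the exit node has no successors
v = edge_count - node_count + 2      # v(G) with one connected component
paths = count_acyclic_paths(graph, "if0", "exit")
```

For four sequential decisions the sketch gives v(G) = 5 but 16 acyclic paths. In this construction v(G) grows linearly with the number of decisions while the number of paths doubles with each one.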


2. Nesting Level 
Structure complexity can be determined by the depth of nesting [13], the average
nesting level [14], and the bandwidth [9]. The basic assumption is that the higher the
nesting level, the more difficult it is to determine the right data values to exercise those
parts of the code. The nesting level of the first executable statement is assigned the value
of one. If the following statement is sequential then it is assigned the same nesting level,
otherwise it is assigned a nesting level of two. In general, if a statement is at level
l and the following statement is in the range of a loop or a conditional transfer of control,
then the nesting level of that statement is l + 1. The average nesting level NL is equal to
the sum of all statement nesting levels divided by the number of statements. The bandwidth
BW is equal to the sum of (i * L(i)) divided by the total number of nodes in the control
graph, where L(i) is the number of nodes at level i.
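
The two definitions can be sketched in Python as follows. The sketch assumes that the nesting level of every statement, or of every control-graph node, has already been determined; the function names and the sample level lists are hypothetical.

```python
from collections import Counter


def average_nesting_level(statement_levels):
    """NL: sum of the statement nesting levels divided by the number of statements."""
    return sum(statement_levels) / len(statement_levels)


def bandwidth(node_levels):
    """BW: sum of i * L(i) over all levels i, divided by the number of nodes.

    L(i) is the number of control-graph nodes at nesting level i.
    """
    nodes_at_level = Counter(node_levels)  # maps level i to L(i)
    weighted_sum = sum(level * count for level, count in nodes_at_level.items())
    return weighted_sum / len(node_levels)


# Hypothetical module: eight statements grouped into five control-graph nodes.
statement_levels = [1, 1, 2, 2, 3, 2, 1, 1]
node_levels = [1, 2, 3, 2, 1]

nl = average_nesting_level(statement_levels)   # 13 / 8
bw = bandwidth(node_levels)                    # (1*2 + 2*2 + 3*1) / 5
```

The two measures differ only in what is counted: NL averages over statements, whereas BW averages over nodes, so a long sequential block at a deep level affects NL more than BW.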


3. Transfer of Control 

The idea of measuring the use of GOTO statements in FORTRAN programs was 
proposed by Woodward, Hennell and Hedley [15]. This measure is called knots. Imagine
a forward arrow drawn on the left margin of a program listing indicating the flow of control 
from each GOTO statement to its respective LABEL and a backward arrow drawn from the 
end to the beginning of the respective loop. There is a knot every time we find an 
intersection of these arrows or control transfers. For equivalent programs the ones with the 
lower number of knots are believed to be better designed. Baker and Zweben [16] showed 
that in some cases this measure does not capture some control flow complexity differences. 
They present an example with two linearizations of a program that are equivalent but have
different knot counts. Another problem is that the addition of alternative
constructs affects this measure in programs written in FORTRAN. For languages with an 
IF-THEN-ELSE operator the inclusion of an alternative construct does not affect the metric. 
Programs with arbitrary amounts of structured transfer of control may have the same 
complexity as any straight line code. This is an unappealing property of a general measure 
of control flow complexity. 
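
A minimal sketch of the knot count follows. It assumes that each transfer of control is given as a pair (source line, target line) and that two transfers form a knot when their line ranges partially overlap, with strict inequalities; transfers that share an endpoint are not counted. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def count_knots(jumps):
    """Count the crossings between control-transfer arrows.

    Each jump is a (source_line, target_line) pair. Two arrows cross when
    exactly one endpoint of one arrow lies strictly inside the other.
    """
    knots = 0
    for (a, b), (p, q) in combinations(jumps, 2):
        low1, high1 = min(a, b), max(a, b)
        low2, high2 = min(p, q), max(p, q)
        if low1 < low2 < high1 < high2 or low2 < low1 < high2 < high1:
            knots += 1
    return knots


# Hypothetical listing: two forward jumps and one backward jump that all cross.
crossing_jumps = [(10, 40), (20, 50), (60, 30)]
# A jump nested entirely inside another one produces no knot.
nested_jumps = [(10, 50), (20, 30)]

crossing_count = count_knots(crossing_jumps)
nested_count = count_knots(nested_jumps)
```

The nested case shows why well-structured code scores low on this measure: properly nested constructs never produce crossing arrows.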

Another pair of measures based on the flow graph of the program, called SCOPE and
SCORT, was proposed by Harrison and Magel [17]. A node is a sequential block
of code with a unique entrance and exit but no internal branch or loop. An edge is the flow
of control between the various nodes. The outdegree of node u is the number of edges
emanating from u, and the indegree of node u is the number of edges incident at u. Nodes
with outdegree 0 or 1 are RECEIVING nodes and those with outdegree greater than 1 are
SELECTION nodes. Given a selection node, we can find at least one lower bound node
which succeeds every immediate successor of the selection node. The lower bound node
that precedes every other lower bound is the greatest lower bound GLB. The number of
nodes preceding GLB and succeeding the selection node, plus one, yields the adjusted
complexity AC of that selection node. The SCOPE metric is calculated by summing up the
adjusted complexity of each node. SCORT is the scope ratio metric and is defined as:

SCORT = (1.0 - N/SCOPE) * 100%

where N is the number of nodes in the flow graph excluding the terminal node. SCORT 
increases towards 100% as complexity increases. 

The SCOPE metric is dependent on the number of nodes in the flow graph.
Therefore this measure is not always reliable, since some programs can be rearranged
to give flow graphs with different scope measures [7].


4. Minimum Number of Paths 
The minimum number of paths in a program, $N_p$, and the reachability of a node,
R, were metrics proposed by Schneidewind and Hoffman [18]. The determination of $N_p$ is
done using path analysis to find a set of unique sequences of arcs from the start node to the
terminal node, excluding paths with backward loops traversed more than once. Since every
decision leads to at least one extra path, it is always true that $N_p \geq v(G)$. Reachability of
a node is defined as the number of unique ways of reaching that node. These metrics may
be hard to determine on large programs because the number of paths can be very large or
even infinite when loops exist.


5. Evaluation of Control Organization Metrics 

Cyclomatic complexity and all the metrics that use the number of decisions are 
based on the assumption that software faults are proportional to control-flow complexity. 
This assumption seems to be well supported for v(G). Since that metric was proposed
in 1976, many empirical studies have shown some relation between the higher
values of this metric and the modules that are more error-prone [36], [37].

The value of v(G), however, may lead us to incorrect conclusions about the
characteristics of the software product. A program may use several data structures that are
very hard to implement and manipulate, and many modules that call other modules
recursively, and still have a low value of v(G). Intuitively this program should be complex
and hard to test, and consequently more error-prone. Thus, this metric is not a good
predictor of the error-proneness of the modules in every case. Nevertheless, cyclomatic
complexity is an easy-to-use and useful rule of thumb for comparing alternate approaches
and for estimating the amount of debugging labor between similar programs developed in
the same environment.

Jensen and Vairavan [9] have indicated that cyclomatic complexity correlates
better than some of the measures based on the nesting level (e.g., bandwidth) with the number
of program changes and problem reports.

The control organization metrics do not consider the contribution of any
factor other than control flow complexity. These metrics, however, can differentiate between
two programs with similar volume metrics and certainly are related to software quality.


B. DATA ORGANIZATION METRICS 

The data organization metrics are measures of data use and visibility, as well as the 
interactions between data within a program. These metrics are concerned with the amount 
of input data, output data and processed data internally used by software. The simplest data 
structure metric is the count of variables that are defined and used in a program. Another
simple data structure metric is the count of the number of I/O formats in FORTRAN or
COBOL code.


1. The Usage of Data Within Each Module 

The usage of data within a module may be measured using the concepts of live 
variables and variable spans. The hypothesis is that the more data items a programmer 
must keep track of when constructing a statement, the more difficult it is to construct. A 
variable may be considered live from its first to its last reference within a procedure. The
average number of live variables is the sum of the counts of live variables at each statement
divided by the number of executable statements. The span is the number of statements
between two successive references to the same variable. Thus, a large span indicates that the
programmer during the construction process had to remember a variable that was last used
far back in the program. These metrics have been used in a study by Elshoff reported in
[19], using programs written in PL/1, to identify areas of greater complexity. Programmers
have indicated that this measure can also be applied to other languages, particularly
COBOL, because the information presented is similar.
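
Both measures can be sketched in Python. The sketch assumes that a procedure is represented as a list with one entry per executable statement, each entry being the set of variable names referenced by that statement. It also assumes that the span counts only the statements strictly between two successive references. The function names and the sample procedure are hypothetical.

```python
def live_variable_average(statement_refs):
    """Average number of live variables per executable statement.

    A variable is treated as live from the statement of its first
    reference to the statement of its last reference, inclusive.
    """
    first, last = {}, {}
    for index, names in enumerate(statement_refs):
        for name in names:
            first.setdefault(name, index)
            last[name] = index
    # Summing lifetimes per variable equals summing live counts per statement.
    total_live = sum(last[name] - first[name] + 1 for name in first)
    return total_live / len(statement_refs)


def spans(statement_refs):
    """Map each variable to the list of its spans between successive references."""
    previous, result = {}, {}
    for index, names in enumerate(statement_refs):
        for name in names:
            if name in previous:
                result.setdefault(name, []).append(index - previous[name] - 1)
            previous[name] = index
    return result


# Hypothetical five-statement procedure.
refs = [{"a"}, {"b"}, {"a", "b"}, {"c"}, {"a"}]

average_live = live_variable_average(refs)   # (5 + 2 + 1) / 5
variable_spans = spans(refs)
```

In the example, variable `a` is live across all five statements, which dominates the average, while `c` is referenced once and therefore has no span.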


2. The Usage of Data Between Modules 

Henry and Kafura [20] proposed a method to measure the complexity of code 
due to the flow of information from one module to another using an information flow 
metric. The flows of information into and out of a procedure are called fan-ins and
fan-outs. Local flows represent the flow of information to or from a routine through the use
of parameters and return values from function calls. Fan-in is the number of local flows
into a procedure plus the number of global data structures from which a procedure retrieves
information. Fan-out is the number of local flows from a procedure plus the number of
global data structures which the procedure updates. The complexity of a procedure p is
defined as:

$$C_p = (\text{fan-in} \times \text{fan-out})^2$$
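
As a worked illustration of the formula, the following Python sketch computes fan-in, fan-out and the resulting complexity for a single procedure. The `ProcedureFlows` record, its field names and the sample values are hypothetical and serve only to make the arithmetic concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ProcedureFlows:
    """Information flows of one procedure (hypothetical record for this sketch)."""
    local_flows_in: int
    local_flows_out: int
    globals_read: set = field(default_factory=set)
    globals_updated: set = field(default_factory=set)

    @property
    def fan_in(self):
        # Local flows into the procedure plus global structures it reads.
        return self.local_flows_in + len(self.globals_read)

    @property
    def fan_out(self):
        # Local flows out of the procedure plus global structures it updates.
        return self.local_flows_out + len(self.globals_updated)

    def complexity(self):
        """Information flow complexity: (fan-in * fan-out) squared."""
        return (self.fan_in * self.fan_out) ** 2


example = ProcedureFlows(
    local_flows_in=2,
    local_flows_out=1,
    globals_read={"symbol_table"},
    globals_updated={"symbol_table", "error_count"},
)
# fan-in = 2 + 1 = 3, fan-out = 1 + 2 = 3, complexity = (3 * 3) ** 2 = 81
example_complexity = example.complexity()
```

Because the product is squared, a procedure that both gathers and distributes a lot of information is penalized far more heavily than one with many flows in a single direction.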

Another approach to the evaluation of complexity between modules is to measure
the sharing of data as global variables among modules as suggested by Basili and
Turner [21]. This may be done by counting the number of pairs (M, R) where M is a
segment or module and R is a global variable that is read or changed by M. These pairs
are called the segment-global usage pairs.

McClure proposed another metric focused on the complexity associated with the 
control structures and control variables used to direct procedure invocation in a program 
[22]. She claims that not all predicates contribute the same complexity. The control
variables appearing in conditional statements that determine the invocation of other
procedures contribute a higher complexity. The complexity of a program module P
consists of two factors: the complexity associated with the control variables invoking
module P and the complexity associated with the control variables by which module P
invokes other modules. The overall complexity is determined by the sum of the
complexities of the modules.


3. Evaluation of Data Organization Metrics 

These metrics are based on the assumption that software complexity is related
to the amount of data processed and the flow of data through the program. This
assumption may not be valid in some cases because there are other factors that contribute
to the complexity of software. The structural complexity and the size are examples
of those factors. There are some studies, however, that found some relation between these 
metrics and the number of faults. 

The information flow metric was used in an objective study of the UNIX™ 
operating system. This study found a statistical correlation of 0.95 between faults and 
procedure complexity as measured by the information flow metric [20]. 

All these data organization measures attempt to capture a kind of complexity
different from that captured by the control organization metrics. The simplest is a count of
the number of entries in the cross-reference list of the program (VARS). The metric VARS
seems to be robust, and even slight variations in algorithm computation schemes do not seem
to affect other measures based upon it.

C. VOLUME METRICS

The volume metrics are measures of the size of the product. There are many methods 
to measure software size. The easiest one is the count of the lines of code. This metric 
may include all the source lines or only the executable statements. Usually it includes the 
lines containing program headers, declarations, executable and non-executable statements, 
and excludes the lines containing comments. Other simple volume metrics are the number
of statements or the number of operators and operands.


1. Software Science Measures 
These measures were created by Halstead and they are part of a more complex
family of metrics called Software Science [23]. We include these measures in the software
size category although his work also proposes metrics to quantify other
aspects of software. All the measures are functions of the count of the tokens that form the
program.
The basic measures are:
$n_1$ = number of unique operators
$n_2$ = number of unique operands
$N_1$ = total occurrences of operators
$N_2$ = total occurrences of operands
Operators are symbols and keywords, and operands are variables, constants and
labels. The length of a program, N, is expressed in tokens as:

$$N = N_1 + N_2$$
The Software Science measures define other metrics related to size. The
vocabulary, n, is:

$$n = n_1 + n_2$$

The volume, V, is:

$$V = N \log_2 n$$

There have been several studies that seem to show that these basic metrics are 
strongly correlated to program size and number of errors [9], [24]. 

Some other measures were proposed: the bug prediction formula B and
the programming effort E:

$$B = \frac{(N_1 + N_2) \log_2 (n_1 + n_2)}{3000}$$

$$E = \frac{n_1 N_2 N \log_2 n}{2 n_2}$$

These measures are more controversial. There are some studies that seem to
confirm the bug prediction formula. They are reported by Lipow in [25] with a comparison
of actual and predicted bug counts over a range of program sizes from 300 to 12,000
executable statements. These results, however, are not conclusive. Conte, Dunsmore and
Shen [26] conclude that these measures, B and E, have not been shown to have good
construct validity and that they probably do not measure exactly what Halstead hoped they
would.
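
The Software Science formulas of this section can be collected in a short Python sketch. It assumes that the four basic counts have already been obtained by some tokenizer, which is not shown; the function name `software_science` and the sample counts are hypothetical.

```python
import math


def software_science(n1, n2, N1, N2):
    """Compute Halstead's size-related measures from the four basic counts.

    n1, n2: unique operators and operands.
    N1, N2: total occurrences of operators and operands.
    """
    length = N1 + N2                                  # N
    vocabulary = n1 + n2                              # n
    volume = length * math.log2(vocabulary)           # V = N log2 n
    bugs = volume / 3000                              # B
    effort = (n1 * N2 * volume) / (2 * n2)            # E = n1 N2 N log2 n / (2 n2)
    return {
        "length": length,
        "vocabulary": vocabulary,
        "volume": volume,
        "bugs": bugs,
        "effort": effort,
    }


# Hypothetical counts for a small module.
measures = software_science(n1=8, n2=8, N1=40, N2=24)
```

With these counts the length is 64 tokens, the vocabulary is 16, and the volume is 64 * 4 = 256. The bug estimate B is simply the volume divided by 3000, which makes clear that B carries no information beyond V.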


2. Unit Count 
The idea behind this approach is to divide the source code into programming
modules or units. These modules may be defined in many different ways. A module may
be a segment of code that can be compiled separately or a procedure that executes a
particular algorithm. Each module may be divided into one or more functions. A function
is defined as a collection of declarations and executable statements that performs a certain
task. It is not easy to count the number of functions unless the programs are created with
each module as a separate function. Studies have shown that different programmers tend
to use a similar number of functions to solve the same problem, even when they use a
different number of modules [27]. Another related measure is the count of the number of
statements per unit, that is, the average length of a programming module.


3. Length Estimators 
Several metrics have been proposed to estimate the length of programs.
Halstead [23] defined a program length estimator $\hat{N}$ as:

$$\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2$$

Jensen and Vairavan [9] proposed an empirical expression to compute the length
$N_F$ of a program, claiming that it was a more accurate estimate than Halstead's $\hat{N}$:

$$N_F = \log_2 (n_1!) + \log_2 (n_2!)$$
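
The two estimators can be compared with a small Python sketch. The function names and the sample vocabulary sizes are hypothetical; both functions assume that the counts of unique operators and operands are at least one.

```python
import math


def halstead_length_estimate(n1, n2):
    """Halstead's estimator: n1 log2 n1 + n2 log2 n2."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)


def jensen_length_estimate(n1, n2):
    """Jensen and Vairavan's estimator: log2(n1!) + log2(n2!)."""
    return math.log2(math.factorial(n1)) + math.log2(math.factorial(n2))


# Hypothetical vocabulary: 8 unique operators and 8 unique operands.
halstead_estimate = halstead_length_estimate(8, 8)   # 8 * 3 + 8 * 3 = 48
jensen_estimate = jensen_length_estimate(8, 8)       # 2 * log2(40320)
```

Since $n! \leq n^n$, the factorial-based estimate never exceeds Halstead's for the same vocabulary; for the sample values it is roughly 30.6 against 48.
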
4. Evaluation of Volume Metrics 
The volume metrics were the first measures of software complexity to be used. 
They have the same limitations as the control organization metrics and the data organization
metrics because they are based on the assumption that complexity is only related to size. 
The software reliability study by Thayer, Lipow and Nelson [5] showed error
rates ranging from 0.04% to 7% when measured against lines of code, with the most reliable
routine being one of the largest. This seems to confirm that there is no direct correlation
between faults and lines of code. Flaherty [28] showed that there is some statistical
correlation between lines of code and maintenance costs. Another study by Li and Cheung
[7] showed that the Software Science length estimator $\hat{N}$ overestimates the actual length
N for small programs and underestimates it for large programs. Thus, it cannot be a reliable
measure of complexity.

The major limitation of volume metrics is that they can only be measured after
the design has been carried out fully to the debugged code. By then it is usually too late
and too expensive to take the necessary corrective action.


D. COMPOSITE METRICS 

Since each metric is designed to capture a particular feature of a program, it is
impossible to determine the overall complexity of a system based exclusively on some
features. This conclusion led to several attempts to incorporate different metrics. The
composite metrics are an attempt to combine some aspects of the previous types.


1. Ordered-Pair Metrics 

There have been several attempts to combine different metrics in ordered pairs. 

Myers [29] proposed the pair (CYC-mid, CYC-max) where CYC-max is 
equivalent to the cyclomatic number, and CYC-mid is equal to CYC-min plus the number 
equal to two less than the number of selections in a CASE statement (CYC-min is the count 
of all conditionals and loops including CASE statements). 

Other measures, proposed by Hansen [30], consist of an
ordered pair where one coordinate is a variation of the cyclomatic number and the other
coordinate is a software science measure.
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Oviedo [31] proposed a complexity metric based on control flow complexity CF 


(the cardinality of a set based on the control flow graph) and data flow complexity DF 
(based on the count of variable definitions and references in each block). 
This measure was defined as: 
C = aCF + bDF 


where a and b are some predefined weights. 


2. Weighted Measures 

Baker and Zweben [16] pointed out the need for a measure that combines 
some of the measuring capabilities of software science and the cyclomatic number. 
A synthesis of software science measures and the cyclomatic number was proposed by 
Ramamurthy and Melton in [32]. A weighted measure is built from a software science 
measure by allowing certain operators and operands to contribute extra values. The purpose 
is to assign weights so that the length and volume measures can detect complexity produced 
by nonsequential control structures. If an operand or operator is part of a control structure, 
the idea is to add a value equal to the nesting level (weight) of that control structure to the 
count of occurrences of that operand or operator. The authors claim that these measures can 
detect the different types of complexity detected by the cyclomatic number and by software 
science. 

3. Hybrid Metrics 

These metrics combine some aspects of data structure metrics and logic structure 
metrics. 

The information flow metric of Henry and Kafura [20] may also be used as a 
hybrid metric. The complexity of procedure p, C_p, is defined as: 

C_p = C_ip * (fan-in * fan-out)^2 

where C_ip is the internal complexity of procedure p determined by any code metric. 
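
As a minimal sketch of the information flow formula above, the following Python function evaluates the complexity of a single procedure. The function name and the example values are illustrative assumptions and are not taken from [20]; the internal complexity may come from any code metric, such as a statement count.

```python
def information_flow_complexity(internal_complexity, fan_in, fan_out):
    """Return C_p = C_ip * (fan_in * fan_out) ** 2 for one procedure."""
    return internal_complexity * (fan_in * fan_out) ** 2


# Hypothetical procedure: internal complexity 20, three incoming flows,
# two outgoing flows.
example_complexity = information_flow_complexity(20, 3, 2)
print(example_complexity)
```

Because the flow term is squared, a procedure with many connections dominates the result even when its internal complexity is small, and a procedure with no incoming or no outgoing flows scores zero.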

Another hybrid metric was proposed by Basili and Hutchens [33]. The 
definition of a syntactic complexity family that could include volume and control metrics 
led to a new metric called syntactic complexity (SynC). A program is called a proper 
program if it has a single entry and a single exit, and every node of the program lies on 
some path from the entry to the exit. 

The measure SynC of a program p is defined as: 

SynC(p) = 1.1 * C(p_i) + 1 + log_2(n+1)            if p is proper 

or 

SynC(p) = 1.1 * C(p_i) + 2 * (1 + log_2(n+1))      if p is not proper 

where C(p_i) is the sum of the syntactic complexities of the subcomponents p_i of the 
program, and n is the number of decisions in program p that are not part of any subcomponent 
p_i. Nesting is penalized by multiplying C(p_i) by 1.1, counting each statement 10% more 
than it would have been counted at the next outer level. Poorly structured code is 
penalized twice as much as well-structured code. Thus, this metric includes 
consideration of nesting level, length (statement count) and structured programming 
practices. 
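
The recursive nature of SynC can be illustrated with a short Python sketch. The Component class, its field names and the base-2 logarithm are assumptions made for this illustration; the sketch follows the two-case definition given above and is not the implementation used in [33].

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical program component used only for this illustration."""
    decisions: int = 0        # n: decisions not inside any subcomponent
    proper: bool = True       # single entry, single exit
    subcomponents: list = field(default_factory=list)


def sync(p):
    """Syntactic complexity of component p, computed bottom-up."""
    nested = sum(sync(s) for s in p.subcomponents)   # C(p_i)
    own = 1 + math.log2(p.decisions + 1)
    if not p.proper:
        own *= 2                                     # improper code counts double
    return 1.1 * nested + own


# A component with one decision that contains two simple statements.
example = Component(decisions=1, subcomponents=[Component(), Component()])
print(sync(example))
```

Each level of nesting multiplies the contribution of the enclosed components by 1.1, so the same statement weighs more the deeper it is nested.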

Li and Cheung propose another hybrid metric in [7]. This hybrid metric is called 
NEW_1 and integrates software science with the SCOPE measure. They define the raw 
complexity of a node V_i as E_i: 

E_i = N_i * log_2(n_i) / L^_i 

where N_i, n_i and L^_i are the software science measures length, vocabulary and the estimate 
of the program level of the node V_i. This last one can be calculated using the following 
expression: 

L^ = (2 / n_1) * (n_2 / N_2) 

The adjusted complexity for a selection node is the sum of the values of the raw 
complexity of every node within the SCOPE of that selection node, plus the value of the 
selection node itself. A receiving node has an adjusted complexity equal to its raw 
complexity. The complexity of the program is the sum of the adjusted complexities of 
every node. They define NEW_1 as: 

NEW_1 = (1.0 - Total Raw Complexities/Total Adjusted Complexities) * 100% 
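
The calculation of NEW_1 can be sketched in Python as follows. The dictionary-based representation of nodes and scopes is an assumption made for illustration, as is the convention that the SCOPE set of a selection node does not list the selection node itself; the node names and raw values in the example are invented.

```python
import math


def raw_complexity(length, vocabulary, level):
    """E_i = N_i * log2(n_i) / L^_i for a single node."""
    return length * math.log2(vocabulary) / level


def new_1(raw, scopes):
    """Compute NEW_1 as a percentage.

    raw    -- dict mapping each node to its raw complexity E_i
    scopes -- dict mapping each selection node to the set of nodes
              within its SCOPE (receiving nodes are simply absent)
    """
    total_raw = sum(raw.values())
    total_adjusted = 0.0
    for node, own in raw.items():
        # Selection nodes add the raw complexity of every node in scope.
        total_adjusted += own + sum(raw[m] for m in scopes.get(node, ()))
    return (1.0 - total_raw / total_adjusted) * 100.0


# Hypothetical program: selection node "a" controls nodes "b" and "c".
raw_values = {"a": 10.0, "b": 20.0, "c": 30.0}
print(new_1(raw_values, {"a": {"b", "c"}}))
```

A program without selection nodes has equal raw and adjusted totals, so NEW_1 is zero; the value grows as more code falls within the scope of selection nodes.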


4. Evaluation of Composite Metrics 

Although composite metrics have the advantage of incorporating the strengths 
of the primitive types of metrics and provide a more accurate measure of software 
complexity, they tend to be harder to calculate. The interest and quality of the information 
supplied may not be sufficient to justify the cost and effort of using them. 

Most of the composite metrics have not been tested as extensively as some of the 
other types because composite metrics are relatively new. The validation process must 
continue before these metrics can be effectively adopted in the characterization and 
evaluation of software. 


E. THE ROLE OF SOFTWARE METRICS 

Software metrics are standard ways of measuring some attribute of the software 
development process. Some metrics have been used in industry while others have been 
confined to academic environments. Those more commonly used in industry are: lines of 
code (the simplest metric), cyclomatic complexity metric (proposed by McCabe in [7]) and 
their variations, and Software Science measures (proposed by Halstead in [23]). The use 
of these and other software complexity metrics in the industry and the armed forces is 
reported in several recent studies [38], [39], [40]. 

The great number of software measures that have been and continue to be proposed 
is a good indication that the controversy that has surrounded them since their first 
appearance is far from ended. Some claim that metrics are useless and expensive exercises 
in pointless data collection, while others argue that they are valuable management and 
engineering tools. 

The value of software measures, their limitations, their strengths, and the benefits they 
can provide have to be verified through empirical studies in different kinds of environments. 
We cannot apply metrics without first understanding what we want to measure and how we 
will measure it. Another issue when applying metrics is how to present the results in a way 
that they can be used and understood by the people in charge of the process. 

This study analyzes the use of some control organization metrics during the software 
development process, especially the testing phase. Some of the specific questions that this 
thesis addresses are: How do these metrics relate to the number of faults detected? How do 
these metrics relate to the types of faults detected? What is the relation between different 
control organization metrics? How can these metrics be used to predict whether a given 
module is error-prone? 

As we collect more data about the relation between complexity metrics and potential 
software problems, we may be able to understand better the real importance and usefulness 
of metrics in issues of software reliability and cost. 


III. DATA ANALYSIS 


A. METRICS INVESTIGATED 

Our hypothesis is that one of the major reasons why errors are not identified by 
programmers is that they are in parts of the code that are logically difficult to reach. In this 
study we decided to use three different measures of complexity to verify our hypothesis: the 
cyclomatic complexity v(G), the average nesting level BW and the number of statements 
STM. We also use another measure obtained from the product of the cyclomatic complexity 
and the bandwidth that we call nested complexity (NC). This is an attempt to find a metric 
sensitive to the level of nesting within the various control structures. 

BW and v(G) were chosen because intuitively they seem to capture the structural 
complexity fairly well. The cyclomatic complexity metric, as a count of the number of 
decisions in each module plus one, is related to the number of changes of control flow. 
However, it cannot detect any complexity due to nested structures. The bandwidth is a 
measure of the average nesting level, so it seems logical to try a combination with v(G) 
to get a more accurate measure of total software complexity. That combination is the 
measure NC. The metric number of statements STM was used because it is a volume 
metric similar to lines of code, the most used measure of software complexity. In this 
study, using Pascal programs, STM is the count of the tokens ";" and "BEGIN". 

The cyclomatic complexity and the number of statements were calculated for each 
module using a lexical scanner adapted to count the tokens according to the set of counting 
rules for the Pascal language used by the Purdue University Software Metrics Research 
Group [27]. The nesting level of each module was analyzed by inspection. 
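
The following Python sketch illustrates how the three computed measures relate to one another. It is a simplified stand-in for the lexical scanner, not a reproduction of it: the set of decision keywords is an assumption (the counting rules of [27] are not reproduced here), comments and string literals are not skipped, and BW is supplied by hand because the nesting level was obtained by inspection.

```python
import re

# Assumed decision keywords; the actual counting rules of [27] may differ.
DECISION_KEYWORDS = {"IF", "WHILE", "FOR", "REPEAT", "CASE"}


def tokens(source):
    """Split Pascal source into upper-cased identifiers/keywords and ';'."""
    return [t.upper() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*|;", source)]


def stm(source):
    """Number of statements: count of the tokens ';' and 'BEGIN'."""
    return sum(1 for t in tokens(source) if t in (";", "BEGIN"))


def cyclomatic(source):
    """v(G): number of decisions plus one."""
    return 1 + sum(1 for t in tokens(source) if t in DECISION_KEYWORDS)


def nested_complexity(vg, bw):
    """NC: product of the cyclomatic complexity and the bandwidth."""
    return vg * bw


# Hypothetical module used only to exercise the functions.
EXAMPLE = """
procedure Clamp(var x: integer);
begin
  if x < 0 then
    x := 0;
  while x > 10 do
    x := x - 1;
end;
"""

example_stm = stm(EXAMPLE)
example_vg = cyclomatic(EXAMPLE)
example_nc = nested_complexity(example_vg, 1.5)  # BW of 1.5 assumed by inspection
```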


B. DESCRIPTION OF THE ENVIRONMENT 

A set of eight programs written from a single specification for a combat simulation 
problem was used in this study. The programs were designed and written in Pascal by 
two-person teams, and the teams were assigned randomly from students in an upper division 
computer science course. The length of the programs varies from 1186 to 2489 lines of 
code and the number of modules of each program varies from 28 to 76. 

A previous study [34] extensively tested these programs. The number of faults 
detected and a brief description of their types were recorded in that study. Five different 
fault detection techniques were used to detect these faults: code reading by stepwise 
abstraction, multi-version voting, run-time assertions inserted by the programmers, functional 
testing with follow-on structural testing, and static data-reference analysis. A total of 209 
faults were detected in that experiment, with 157 faults identified with specific pieces of 
code. The remainder mainly dealt with missing code and faults with distributed causes. 

The fault classification scheme used in that study was a fault taxonomy with 13 
classes designed specifically to reflect the variations in faults between the techniques. The 
fault classification scheme is described in Table 1, drawn from [35]. 


TABLE 1 - FAULT TAXONOMY 

CLASSES OF FAULTS               | EXAMPLES OF FAULTS                                  | TECHNIQUE USED 
 1 - Overrestriction            | Rejecting legal inputs                              | Assert, Read, Test, Vote 
 2 - Loop Condition             | Infinite loops                                      | Vote, Assert, Test 
 3 - Calculation                | Incorrect formulas                                  | Read 
 4 - Initialization             | Variables not initialized                           | Static Analysis, Test 
 5 - Substitution               | Wrong variables used                                | Vote, Assert 
 6 - Missing Check              |                                                     | Read 
 7 - Branch Condition           |                                                     | Vote, Read, Test 
 8 - Missing Branch             | Localized missing code                              | Read, Test 
 9 - Missing Thread             | Missing path throughout program                     | Vote, Test 
10 - Unimplemented Requirement  | Missing functionality on all paths                  | Test 
11 - Ordering                   | Operations in wrong order                           | Vote, Test 
12 - Parameter Reversal         | Actual parameters permuted with formal parameters   | Vote, Assert 
13 - Data Structure             | Linked list becomes circular                        | Vote, Test, Read, Assert 


C. RELATION OF METRICS WITH NUMBER OF FAULTS 

The average values of the metrics and the number of faults found for each program 
are shown in Table 1 of the Appendix. 

Our results seem to confirm that there is some relation between software complexity 
due to the structure of the program and the number of errors. The modules with greater 
complexity using any of the three control structure metrics have more detected faults. These 
results seem to confirm other studies by Walsh [36]. The bandwidth and the nested 
complexity also seem to have some relation with the number of faults. However, the 
percentage of faults detected with these metrics is not greater than the percentage obtained 
using v(G). This is a useful observation because the computation of v(G) is easier than the 
computation of BW and NC. 

We observed the following averages using the set of eight versions: 18% of the total 
number of modules had v(G) greater than 10, and these modules contained 51% of the total 
number of faults; the modules with BW greater than 2.5 were 24% of the total number of 
modules, and these had 47% of the faults; the modules with NC greater than 29 were 17% 
of the total, and these contained 47% of the faults. These values are a good indication that 
we may be able to detect the modules with more tendency to have errors using complexity 
metrics, especially these particular control structure metrics. 
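
The percentages quoted above can be obtained with a calculation of the following form. The sketch is in Python; the function name, the record layout and the five example modules are hypothetical and are not data from this study.

```python
def share_above(modules, metric, threshold):
    """Return (% of modules, % of faults) for modules whose metric
    value is strictly greater than the threshold."""
    flagged = [m for m in modules if m[metric] > threshold]
    total_faults = sum(m["faults"] for m in modules)
    pct_modules = 100.0 * len(flagged) / len(modules)
    if total_faults == 0:
        return pct_modules, 0.0
    pct_faults = 100.0 * sum(m["faults"] for m in flagged) / total_faults
    return pct_modules, pct_faults


# Invented sample: five modules with their v(G) and detected faults.
sample = [
    {"vg": 3, "faults": 0},
    {"vg": 12, "faults": 4},
    {"vg": 5, "faults": 1},
    {"vg": 15, "faults": 3},
    {"vg": 2, "faults": 2},
]
pct_modules, pct_faults = share_above(sample, "vg", 10)
```

A metric is a useful predictor in this sense when the flagged modules are a small share of all modules but account for a much larger share of the faults.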

The modules having STM greater than 24 comprised 28% of the total number of 
modules and these contained 52% of the faults detected. The small percentage of modules 
is misleading because these modules have on average 65% of the total number of statements 
in each version. The metric STM does not seem to have any relation with the number of 
faults. This result is a confirmation of other studies [5] that did not find any relation 
between lines of code and software faults. 

Our preliminary results seemed to indicate that the modules where the metric NC was 
less than 4 also contained a greater number of faults. Our first hypothesis was that this 
could be a consequence of the carelessness of programmers, because the modules 
seemed obvious and easy to implement. This assumption, however, was not validated by 
our data because the large number of faults actually found in those modules was a direct 
consequence of having many modules with small values of NC in this set of programs. 

The faults in the modules with less complexity seem to have a regular distribution: 
16% of the modules have NC equal to 1 and 10% of the total number of faults; 43% of the 
modules have NC less than or equal to 4 and 23% of the faults; 56% of the modules have NC 
less than or equal to 8 and 35% of the total number of faults. These results seem to show that 
the number of errors in the modules with less complexity increases at the 
same rate as complexity when the value of NC is less than 30. In the modules where this 
metric is greater than 30 the number of errors increases at a higher rate. 

The complete analysis of variance of faults using the four different metrics with each 
version is shown in Tables 19-22 of the Appendix. The between groups variance is the 
estimate of variance based on the differences between the means of sets of modules with 
the same value of the metric. This estimate reflects the internal differences in the number 
of faults detected between sets of modules separated according to the values of the metrics. 
The estimate of variance based only on the differences between individual modules is called 
the within groups variance. This estimate reflects only the chance variations involved in 
drawing a sample. The degrees of freedom of the variation between groups is the number 
of groups or sets of modules with the same value of the metric minus one. The degrees of 
freedom of the variation within groups is equal to the total number of modules minus the 
total number of groups of modules. The F-ratio is the quotient of the two variances. The 
F-ratio is used to determine if the difference between groups in a sample is significant or 
not. This can be done using tables of the F-distribution and the values of the two degrees 
of freedom. The mean square is the ratio between the sum of squares and the respective 
degrees of freedom. 
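
The quantities described above can be computed as in the following Python sketch of a one-way analysis of variance. The function name and the grouped fault counts are illustrative assumptions and are not data from the Appendix; each inner list holds the fault counts of the modules that share one value of the metric.

```python
def one_way_anova(groups):
    """Return (F-ratio, df between groups, df within groups).

    groups -- list of lists of fault counts, one list per set of
              modules with the same value of the metric
    """
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]

    # Sums of squares between and within groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)

    # Mean square = sum of squares / degrees of freedom.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Invented fault counts for three sets of modules.
f_ratio, df_b, df_w = one_way_anova([[0, 1, 0], [1, 2, 1], [3, 4, 5]])
```

The resulting F-ratio is then compared with the tabulated value of the F-distribution for the two degrees of freedom to obtain the significance level.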





The analysis of variance of the number of faults using all the metrics presented very 
low significance levels, which is an indication that the probability that our results were 
obtained by chance is very low. The only exceptions were found using NC or v(G) in 
version 8, and this may be a consequence of having only two modules with high complexity 
containing only one fault in this program. Our results also indicate that the variations in 
the number of faults between the different sets of modules according to the values of the 
metrics are significant, because in general all the versions have the variation between groups 
greater than the variation within groups. This is another indication of the good 
fault-predictive ability of software control-structure metrics. 


D. RELATION OF METRICS WITH TYPES OF FAULTS 

We used the values of NC to divide the modules into two sets: those with NC less than or 
equal to 4 and those with NC greater than or equal to 30. The modules with NC between 5 and 
29 were not considered. Then, we identified the faults found in the two sets of modules and 
their respective types according to the fault classification previously described. 

We found some similarities and some differences between the types of faults detected 
in the two sets of modules. About 43% of the total number of faults in both sets belonged 
to classes 3 and 6. Class 3 faults are calculation faults, for instance the use of the wrong 
expression in the calculation, and class 6 faults are due to missing code to deal with illegal 
behavior, for example divide by zero faults. The first type of fault may occur because of 
misunderstanding of the specifications during the translation to code, and obviously does 
not depend on the complexity of the structure. The second type may be the result of the 
carelessness of programmers because of time constraints, so frequent during the 
development of any software system. 

The significant differences between the two sets of modules were found in classes 
1 and 5, faults due to overrestrictive input checks and wrong variable uses, respectively. 
The modules with less complexity had a low incidence of these types of faults (6%) while 
the modules with more complexity had a high incidence (19%). These faults may be caused 
by different reasons. We have two hypotheses to explain the observation. Programmers 
tend to clutter the source code with unnecessary conditions when it is already complex from 
the beginning. This may happen because they do not understand exactly what the program 
should do in those areas, leading to class 1 faults. The reason for class 5 faults may be 
related to the difficulty of keeping track of the variables and their use in the modules with 
a large number of nested control structures. 

The remaining classes of faults had no significant clusters to allow some conclusions 
about their relation to structural complexity. Their distribution was quite similar in both 
sets. 


E. RELATION BETWEEN METRICS 

In order to understand the relationship between the various software metrics used, 
Pearson correlation coefficients were computed for every pair of metrics, indicating the 
degree of linear relationship between them. Pearson values lie in the interval [-1,1]. The 
correlation coefficients for each program are shown in Tables 2-9 in the Appendix. 
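
For reference, the Pearson coefficient for one pair of metrics can be computed as in the Python sketch below. The function name and the per-module values are invented for illustration and are not taken from the Appendix.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical v(G) and STM values for five modules of one program.
vg_values = [2, 5, 9, 12, 20]
stm_values = [10, 14, 30, 28, 55]
r_vg_stm = pearson(vg_values, stm_values)
```

The function assumes that neither sequence is constant, since a constant sequence has zero spread and the coefficient is then undefined.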

We observed that the correlation between v(G) and BW is not very high and its value 
depends on the program. This observation does not confirm the earlier results of [6]. This 
seems to be intuitively correct because the two metrics are measuring different aspects of 
software complexity. 

The complexity measure NC seems to correlate well with v(G) and BW. This 
measure seems to bridge the gap between the two previous metrics and conceptually is a 
more refined measure of the complexity of the control structure. 

Another observation is that the STM metric does not correlate well with any of the 
other metrics. This result is different from other studies [9] that presented the cyclomatic 
complexity correlating well with lines of code, another volume metric. Our values for the 
correlation between v(G) and STM are similar to the results reported in a more recent study 
by Henry and Selig [36] using Pascal source code (0.65 against 0.63). 

Tables 10-17 in the Appendix show the values of the metrics and the number of 
faults found for all modules in each version. 


IV. DATA INTERPRETATION 


A. DATA LIMITATIONS 

It is important to consider several limitations when drawing conclusions from the data 
presented in this study. First, this study used several versions of only one application 
written in one language, Pascal, and this may not be representative of a large number of 
applications. Second, data gathered from programs designed and written by students should 
be used with caution. Lastly, the number of faults in each module may be misleading 
because the versions may have more faults than those that were detected. However, the 
final versions are relatively large and have been produced from a specification derived from 
an industrial specification. They have been extensively tested and the testing methods used 
provide a relatively good coverage. 


B. USING METRICS IN SOFTWARE DEVELOPMENT 

In spite of the limitations that unfortunately are common in this type of 
experimentation, information can be derived from this study about the software development 
process. 

Our data indicate that the modules with higher values of software complexity using 
the three different control organization metrics v(G), BW or NC, have more detected faults. 
This is a good indication that these metrics can be used to predict the modules with more 
tendency to have faults. This may be useful particularly if they are used at the design stage 
providing feedback to the software developers, allowing the redesign of those modules. The 
fact that these metrics can be computed from the high level description of the algorithms 
in the form of control-flow graphs or even pseudo-code is another reason why they must 
be used in the earlier stages of the development process to reduce the impact of the changes 
and consequently their cost. 

The coding phase should start only after the detailed design phase has pruned out the 
most troublesome areas according to the value of the software complexity metrics. If it is 
not possible to eliminate those areas at the design stage, the project managers must be 
alerted to inherent levels of complexity in the source code and take appropriate actions 
during the reviewing and testing phases. Given a limited budget, a large project cannot 
afford complete branch coverage or inspection coverage. It is most effective to simplify 
unnecessarily complex modules and spend more time inspecting, reviewing and testing those 
modules that are inherently more complex. 


Another observation concerns the types of faults detected in the modules according 
to their measured complexity. The most common types of faults detected in all the modules 
independently of the value of the complexity metrics were calculation faults and faults due 
to missing checks for obvious illegal behavior. This result seems to be an indication that 
these types of faults are not related to structural complexity and have to be handled in a 
previous stage of the development process, the requirements specification phase. 

We also verified that the modules with more complexity had a relatively high 
incidence of faults due to overrestrictive input checks and wrong uses of variables, when 
compared to the modules with less complexity. This result seems to show that structural 
complexity at the source code level is related to these particular types of faults. 


Our data suggest that software metrics can be used to divide the modules according 
to their structural complexity before the testing phase starts. The testing techniques used 
for each set of modules may be chosen according to the types of faults occurring more 
frequently in each set. This would allow a more efficient use of the several testing 
techniques because some of them are more suited to find particular types of faults. 

There is no widely accepted detailed taxonomy for fault classification. Other 
classification schemes that may be used in similar studies were proposed by Beizer [2], 
Rubey [42] and Endres [43]. This raises the issue of having a standard to define the 
different types of software faults, even if it is evident that there is no universally correct 
way to categorize faults. That standard taxonomy could be only a starting point. This 
would provide a unified framework for all the research dealing with software faults and 
software reliability. 

The study of the relation between software metrics, fault detection techniques and the 
different types of faults has to continue. The testing tools available now have to be used 
in the most effective way because we cannot afford the cost of testing very large and 
complex programs using brute-force approaches. 

The maintenance phase may also get some benefits from the use of metrics. Most of 
the software being developed now results from changes in existing products instead of new 
products started from scratch. The modifications done to the programs usually consist of 
adding new functionalities, resulting in higher complexity of the modules at the source code 
level. Complexity metrics may be used to monitor changes to existing software to keep the 
modules in a manageable and testable form. 

The creation of automated tools measuring software complexity at each stage of 
development to flag potential problems to the project managers may reduce costs and 
increase the reliability of software. These automated tools must present the metrics results 
in a way that all the personnel involved in the process can understand them without 
difficulty instead of providing just pages and pages of numbers, formulas and tables that 
nobody wants to look at. There are already tools that present a graphical representation of 
the cyclomatic complexity of the programs. This approach gives better results than 
presenting only the numerical values because programmers and managers respond more 
readily to the visual image [44]. 

Another useful observation of this study is that a great number of the faults detected, 
those of classes 3 and 6, were found just by reading the code. These results confirm the 
observations of Beizer in [2] that desk checking and particularly code reading are the best 
catchers of private bugs and cannot be completely replaced by any other technique. 


C. TESTING ANOTHER VERSION 

The existence of another version of the same program where the faults had not been 
identified during the experiment reported in [34] gave us an opportunity to test some of the 
results obtained with the other versions. 

Our initial approach was the determination of the values of the metrics for each 
module to establish different sets of modules according to their structural complexity. Then, 
we tried to detect the maximum number of faults just by reading the code. Using this 
technique we found 16 faults caused by missing code to deal with divide by zero situations. 
The faults were scattered throughout the program without any special incidence in the 
modules with greater complexity. The modules with greater complexity had only 3 faults. 
These observations seem to confirm the results obtained with the other versions that 
indicated that this type of fault does not have any relation to structural complexity. 

In our effort to detect more faults we ran the program with 100 randomly generated 
test cases. Using this technique we verified that 15 cases gave us results that indicated the 
existence of faults in some modules. Analyzing the detected faults, we found a missing 
branch in one routine, two faults involving variable initialization and use in another routine, 
an unused function, a loop scoping fault and two calculation faults. 

The use of random tests to detect the existence of faults and the nested complexity 
metric to detect the modules that may contain those faults was extremely useful during this 
testing phase. We found a higher incidence of faults caused by overrestrictive checks and 
missing branches in the modules with greater complexity, such as Observation, Restoration and 
OutputReport. This seems to confirm our previous observation that at least the first type 
of fault is more frequent in the modules with higher values of nested complexity. 

The values of the metrics and the faults detected for each module in this version are 
presented in Table 18 of the Appendix. Due to the incompleteness of the data on this 
version, statistical analysis was not performed on the relationship between detected faults 
and the metrics. Nevertheless the results support the use of this metric in realistic testing. 


V. CONCLUSIONS 


A. FUTURE RESEARCH 

The use of metrics in software development has gained increasing interest in recent 
years as research shows their usefulness. However, there have been many proposals of new 
metrics, some of them complex and difficult to use and others trying to measure subjective 
aspects of software that cannot be measured at all. We are running the risk of spending 
more money implementing the metrics program to control the development process than 
building the software systems themselves. This may be one of the reasons why software 
metrics have raised so much controversy and skepticism among the software developers and 
researchers. As we stated at the beginning of this work, a good metric must be simple for 
the software developers to calculate and understand, otherwise its usefulness is completely 
overwhelmed by the overhead of using it. Researchers should continue to test the existing 
metrics with real data, with different kinds of programs and systems to verify their 
applicability. Project managers and programmers in industry and in the armed forces should 
start controlling their software development processes using different types of metrics 
instead of only the traditional Software Science measures and the cyclomatic complexity. 
More data must be collected incorporating programs of different types like operating 
systems, compilers and embedded real-time systems to verify the usefulness of metrics. 

The application of other fault detection techniques to version 9, trying to test the 
results and hypotheses generated from the other versions, may provide some answers to the 
following questions: Are the faults due to overrestrictive input checks and wrong variable 
uses more frequent in the modules with greater complexity? Is the majority of faults in 
programs caused by wrong expressions and missing checks? We need to know if our 
observations are a good indication of some pattern or if they are only a consequence of this 
particular environment. 

We can never be sure that a verification is correct, thus, we need to apply the testing 
techniques in the most effective way to gain a reasoned and cautious assurance that the 
programs will run satisfactorily. To achieve this goal we need to have a better knowledge 
of the strengths and limitations of each testing technique. Are they particularly suited to 
find some types of bugs? What are those types? We need more empirical studies to test and 
compare the testing techniques in different environments to provide some answers to these 
questions. Can we develop new testing tools to help us to find obvious illegal situations? 
Can we build more powerful data flow analyzers to follow the use of the variables 
throughout the program? The automation of the testing process is another area of research that needs 
to be addressed by the computer-science community. 

The impact of using formal methods during the requirements specification phase is 
another interesting area of research that can find answers to some of the questions raised 
by our work. Can we reduce only some types of faults using that approach or can we 
reduce all types of faults? Is the structural complexity of the programs reduced if we use 
those methods or is it increased? Is it possible that when we are reducing the number of 
faults caused by incorrect specifications we are also increasing the number of faults caused 
by structural complexity? If we add more checks and more conditions to the code based on 
more correct specifications to assure that nothing in the requirements is left out, our 
assumption is that the program is going to be more complex. Therefore, it will be more 
difficult to test and debug. 


B. FINAL COMMENTS 

This study raised new questions but also provided some answers to the questions and
hypotheses presented in the introduction.

This empirical study shows that control organization metrics can be used to predict
the more error-prone modules. Our data seem to indicate that the number of nested control
structures used in the programs has some relation to certain types of faults, namely
those caused by overrestrictive input checks and wrong use of variables. This observation
seems to confirm our hypothesis that some faults are not detected because they are in parts
of the code that are difficult to reach during the path sensitizing process. This information
may be useful during the testing phase because software developers know in advance that
the data flow in the modules with higher levels of nesting needs to be checked. The
modules with less nested complexity show a regular distribution of faults, most of them
caused by wrong expressions and missing checks that cannot be related to structural
complexity. These bugs seem to be caused by faulty specifications and have to be
eliminated in the requirements specification phase through the use of formal specification
techniques.
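
As a minimal sketch of how a tester might act on this observation, the Python fragment
below ranks modules by nested complexity (NC) so that the most deeply nested ones can be
given data flow checks first. The rows are copied from Table 13 (version 4) in the
appendix; the function name `rank_by_nesting`, the tuple layout, and the reading of "-" in
the FAULTS column as zero faults are assumptions made for illustration, not part of the
original study.

```python
# Rows copied from Table 13 (version 4): (module, v(G), BW, NC, STM, faults).
# Assumption: a "-" in the FAULTS column means that no fault was recorded (0).
MODULES = [
    ("ReportMessages", 9, 1.89, 17, 27, 0),
    ("Movement", 8, 2.88, 23, 28, 0),
    ("Observation", 15, 5.53, 83, 30, 6),
    ("ValidObservation", 4, 1.75, 7, 12, 1),
    ("ObsJamming", 7, 2.71, 19, 18, 1),
    ("Angle", 18, 2.61, 47, 45, 0),
    ("Line", 8, 2.13, 17, 20, 2),
    ("LocationList", 3, 1.33, 4, 18, 1),
    ("Restoration", 9, 3.22, 29, 26, 0),
    ("Initialize", 24, 1.83, 44, 63, 0),
]


def rank_by_nesting(modules, top_n):
    """Return the top_n modules with the highest nested complexity (NC)."""
    return sorted(modules, key=lambda row: row[3], reverse=True)[:top_n]


if __name__ == "__main__":
    # Print the three modules whose data flow a tester would check first.
    for name, vg, bw, nc, stm, faults in rank_by_nesting(MODULES, 3):
        print(f"{name:<12} NC={nc:>3} v(G)={vg:>2} faults={faults}")
```

In this sample, Observation has the highest NC and also the most recorded faults (6),
while Angle and Initialize show that a high NC does not by itself imply recorded faults.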

Software developers should use not only formal methods to specify the
requirements but also software metrics to control how those requirements are implemented.
Even if we use automated tools to build the systems based on formal specification
languages, the human intervention caused by the interaction between the developers and the
customers during the definition of the requirements is going to cause faults. Another
problem that may arise is that code created by automated tools is usually highly optimized
and consequently very complex. Our data indicate that an increase in structural complexity
may create other types of faults, and that this increase can be detected using control
structure metrics. Software complexity metrics can also be used to identify the improper
integration of functional enhancements made to the systems. The analysis of the redesigned
versions of the systems using metrics can reveal poorly structured components. This can
be particularly useful for monitoring maintenance activities, one of the most critical
phases of the software development cycle in terms of cost.

APPENDIX - TABLES OF METRICS, CORRELATION COEFFICIENTS AND
ANALYSIS OF VARIANCE OF FAULTS WITH METRICS


TABLE 1 - AVERAGES OF METRICS

TABLE 2 - CORRELATION COEFFICIENTS FOR VERSION 1

TABLE 3 - CORRELATION COEFFICIENTS FOR VERSION 2

TABLE 4 - CORRELATION COEFFICIENTS FOR VERSION 3

TABLE 5 - CORRELATION COEFFICIENTS FOR VERSION 4

TABLE 6 - CORRELATION COEFFICIENTS FOR VERSION 5

TABLE 7 - CORRELATION COEFFICIENTS FOR VERSION 6

TABLE 8 - CORRELATION COEFFICIENTS FOR VERSION 7

TABLE 9 - CORRELATION COEFFICIENTS FOR PROGRAM 8

TABLE 10 - METRICS FOR VERSION 1

NAME OF MODULE

Ceiling
MinI
MinR
MaxIR
SizeListLoc
OutsideRange
Scream
SquadAlive
BatAlive
VerifyInput
CheckParams
CheckArmyValues
CheckComMsg
CheckWeather
BatVelocV
AltitudeZ
DistD
TerrMoveTM
WeatherSevFactWF
WeatherObservWC
WeatherMoveWM
Position
HeightH
FindAngle
FirstCondition
SecondCondition
SlopeIntensityIS
IntensityLocIL
LocationIntensityBI
VisualContrast
ObservJam
ThirdCondition
SendReports
Observation
Movement
PrepareOutput
Initialization
Restoration
TotalRestoredCasualtiesFF
Coefficient
NumSquadsRestoringNF
RestoreSuppAmtFS
RestoreFactorF
Attrition
SetFiredUponCoords
AssignLLCoords
KilledK
CalcEndurE
NumKillersNK
KillersAvailKA
TimesKillersUsedKU
TotalWeapInUseNW
Communication
TotalSquadsSendingNS
TotalSquadsReceiveNR
TotalSquadsJammingNJ
TotalSquadsProcessingNP
PutIntoList
SendMsgs
ProcessCommandMessages
ProcessReportMessages
MsgReceiptDelayRD
ManipProcessList
PutMsgOnSentLL
ManipMsgQueue
Update
InstantiateCommandMsg
ClearDeadSquads
ForEachSquad
ForEachWeap
UpdateArmyValues
Conflict

TABLE 11 - METRICS FOR VERSION 2

NAME OF MODULE

Conflict
AttritInit
MinReal
MinInt
Max
MaxInt
Roof
Floor
Dist
Alt
TMove
PosIntens
WTotal
WMove
WObs
ScaleSquad
Positioning
Velocity
XMove
YMove
Movement
CalcContrast
Observation
CanJSeek
AngleBigEnough
Slope
FindPt
NoObstacles
Height
Ojamming
NoObsJammed
Attrition
AttritInflict
Weapons
FireCoord
Suffering
Restoration
Communication
UpdateComm
AddToQ
CreateReports
CreateCommands
PullFromQ
RelayMessages
ConsumeReports
ConsumeCommands
Simulation
Initialization
PosInit
MoveInit
ObsInit
CommInit
MoveOut
Output
AttritOut

TABLE 12 - METRICS FOR VERSION 3

NAME OF MODULE

max
min
Distance
Cieling
findA
Altitude
BI
TM
WF
WM
WO
Change
SubAngle
InitRec
Output
DataUpdate
ScanQueue
PutInQueue
BatPosition
FollowCommandMessages
positioning
ReceiveMessages
Sighting
CompareRecDMessages
Observe
SendObservations
SendOrders
SendMessages
Update
DoDamage
WeaponSighting
Summation
Attrition
Jam
Move
Restore
PerformPassiveFunction
Aggression

NAME OF MODULE v(G) BW NC STM FAULTS
DoAction 1 1 1 5 -
InitVals 11 3.36 37 53 3
Conflict 3 1.33 4 64 2
ScanCQueue 7 1.57 11 14 -
NM 11 3.09 34 14 1

TABLE 13 - METRICS FOR VERSION 4

NAME OF MODULE

Conflict
IsDestroyed
IsCasualty
Altitude
Distance
WSeverity
WEObservation
WEMovement
CalcVelocity
MTerrain
SlopeIntensity
AltIntensity
LocIntensity
IntensityOfLocation
Ceiling
PositionSquads
InitVisual
Transfer
BattalionSize
PrepOutput
Attrition
SetCoordinates
Inflict
WeaponCount
WeapUsage
AvailableWeapons
WeaponInflict
Suffer
Endurance
SquadDamage
InitFireList
Communications
ProcessMsg
SendMsg
QueueMsg
ReceiveMsg
AddToList
CmdReplace

NAME OF MODULE v(G) BW NC STM FAULTS
ReportMessages 9 1.89 17 27 - 
CommandMessages 7 1.43 10 23 - 
Movement 8 2.88 23 28 - 
Observation 15 5.53 83 30 6 
ValidObservation 4 1.75 7 12 1 
CheckHeight 2 1 2 9 -
ObsJamming 7 2.71 19 18 1 
Angle 18 2.61 47 45 - 
Line 8 2.13 17 20 2 
ObsContrast 3 1 3 5 - 
LocationList 3 1.33 4 18 1 
VisualContrast 8 3.38 27 19 -
NewCasualties 3 1.33 4 12 - 
TotalCasualties 3 1.33 4 11 -
RestoreAmount 6 2.17 13 12 - 
RestoreSupplies 4 1.25 5 13 -
SquadFixers 1 1 1 7 -
Restoration 9 3.22 29 26 - 
Initialize 24 1.83 44 63 - 

TABLE 14 - METRICS FOR VERSION 5

NAME OF MODULE v(G) BW NC STM FAULTS
Conflict 4 1 4 65 - 
Distance 3 1 3 4 - 
Altitude 1 1 1 13 3 
CompWeath 5 9 45 18 - 
Position 16 1.69 27 39 - 
Simulate 9 2.33 21 22 - 
ChangeOld 1 1 1 10 - 
ChangeSquad 1 1 1 8 - 
Attrition 5 1.40 7 6 - 
Suffer 11 4.82 53 36 - 
Inflict 27 3.52 95 94 2 
Comnicat 40 3.35 134 157 5 
StoreMess 5 2.20 11 23 - 
ReprtMess 6 1.17 7 21 - 
ComndMess 5 2.20 11 23 - 
UpdateCommVars 1 1 1 12 - 
Movement 11 2.09 23 29 -
WeffMov 2 1 2 7 - 
TeffMov 3 1 3 13 - 
Observation 49 6.98 342 151 - 
SpacePoints 7 2 14 36 2
IntnstyLoc 1 1 1 13 2 
WeffObs 2 1 2 6 - 
Restoration 14 2.57 36 32 1 
Wear 10 3.6 36 21 - 
Validate 89 2.73 243 70 4 
SetInitialValues 8 2.50 20 38 3
OutputResults 11 3.91 43 37 1 
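
The relation between a metric and the fault counts in a table like this one can be
summarized with a correlation coefficient. The sketch below computes a Pearson coefficient
between NC and FAULTS for the version 5 modules listed above. It is illustrative only: the
choice of Pearson's formula, the helper name `pearson`, and the treatment of "-" as zero
faults are assumptions, and the result is not claimed to reproduce the coefficients
reported in Table 6.

```python
from math import sqrt

# NC and FAULTS columns of Table 14 (version 5), in table order.
# Assumption: a "-" in the FAULTS column is read as 0 faults.
NC = [4, 3, 1, 45, 27, 21, 1, 1, 7, 53, 95, 134, 11, 7,
      11, 1, 23, 2, 3, 342, 14, 1, 2, 36, 36, 243, 20, 43]
FAULTS = [0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 2, 5, 0, 0,
          0, 0, 0, 0, 0, 0, 2, 2, 0, 1, 0, 4, 3, 1]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(f"r(NC, FAULTS) = {pearson(NC, FAULTS):.3f}")
```

The same function can be applied to the v(G), BW or STM column of any version table by
substituting the corresponding list of values.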

TABLE 15 - METRICS FOR VERSION 6

NAME OF MODULE v(G) BW NC STM FAULTS
Conflict 1 1 1 59 - 
Min 2 1 2 4 - 
Max 2 1 2 4 -
IMin 2 1 2 4 -
IMax 2 1 2 4 -
Ceiling 3 1.33 4 6 -
Distance 1 1 1 7 -
Height 5 1.40 7 13 -
UpdateBattalionVelocity 3 1.33 4 13 -
CheckBattConstants 24 1.17 28 24 - 
AlignSquads 13 2 26 31 - 
InitBattalion 23 1.96 45 43 3 
Initialize 17 1.29 22 28 - 
CreateLosList 2 1 2 10 - 
PerformSimulation 3 1.33 4 20 1 
UpdateWeather 1 1 1 6 -
UpdatePresentEvents 4 1.75 7 23 -
AddNewEvents 3 1.33 4 14 -
PerformOneDt 1 1 1 9 - 
WeatherSeverity 6 1.67 10 17 - 
Movement 4 1.75 7 5 - 
MoveBattalion 1 1 1 15 - 
TEOnMovement 2 1 2 11 - 
WEOnMovement 2 1 2 10 - 
Observation 4 1.75 7 10 - 
GenObsList 9 3.22 29 25 - 
Observable 3 1.33 4 11 1 
AngleSubGreater 16 3.69 59 42 - 
UpdateLOSList 2 1 2 14 1
LOSClear 3 1 3 20 1 
CntrstOK 3 1 3 21 - 
LocationIntensity 1 1 1 16 -
ObsJamming 3 1.33 4 12 -
WEOnObservation 2 1 2 10 - 
IncludeCommObs 7 2.57 18 27 1 
CollectFinishedReport 13 3.85 50 13 - 
UpdateLL 8 3.88 31 26 - 
SumObsToNextBatt 12 3.25 39 27 - 

NAME OF MODULE

Attrition
NumWeapons
TrackWeapons
UpdateUseList
ChooseTargets
SufferAttition
Restoration
NewNumFixers
ApportionFixing
RemoveDestroyedSquads
Communication
SendCommunications
SendReport
NewNumSend
SendCommand
ReceiveCommunications
FindReceivingDelay
ReceiveReports
ReceiveCommands
UpdateNumVars
ProcessCommunications
HandleQueuing
QueueReports
FindQueueSpot
QueueCommands
FindQueue
ProcessingDelay
ProcessMessages
FindNextReport
FindNextCommand
TakeACommand
TakeAReport
NewNumProcessing
PrepareForNextDT
CollectCommands
CollectCommand
PutInCommand
DetermineOutput

TABLE 16 - METRICS FOR VERSION 7


NAME OF MODULE v(G) BW NC STM FAULTS 
Conflict 1 1 1 23 1 
float 1 1 1 4 - 
WriteError 11 1 11 19 -
CheckMessages 6 2 12 16 -
CheckWeather 6 1.67 10 10 -
CheckParams 9 1 9 18 - 
Process 11 2.91 32 55 - 
InvalidPosition 5 1 5 7 - 
CheckBatallionInfo 1 1 1 9 - 
CheckNAmy 4 1.50 6 12 1 
CheckPerBatallion 30 1 30 40 3 
CheckPerSquad 6 1.67 10 13 - 
CheckPerEnemy 5 1.80 9 9 - 
CheckPerWeapon 10 2.50 25 16 - 
GetTs 1 1 1 8 2 
Altitude 1 1 1 16 - 
Distance 1 1 1 7 - 
WFactor 5 1.80 9 17 -
WXPosition 1 1 1 4 - 
WYPosition 1 1 1 4 - 
Height 1 1 1 9 - 
MakeInt 2 1 2 4 -
Initialize 13 2.46 32 34 1 
Velocity 4 1.75 7 14 - 
SetSquad 3 1.33 4 27 - 
SetPosition 11 2 22 39 - 
InitializeWeapData 3 1.33 4 11 1 
PositionSquadrons 4 1.75 7 16 -
SetPosition 11 2 22 37 - 
Observation 15 3.67 55 27 3 
VisibleSquad 5 1.20 6 23 1 
SubAngle 9 2.56 23 11 -
GetAngle 29 1 29 23 - 
Series 2 1 2 16 1
ClearView 5 1.40 7 16 -
OContrast 4 1.25 5 22 -
Intensity 1 1 1 16 -
OJamming 3 1.33 4 11 -

NAME OF MODULE

WObserve
CommandMess
RecDelay
JammedSquads
Incorporate
InitializeWeData
Attrition
InflictAttrition
SetFire
CalcNumObserv
CalcNumWeapToUse
Min
CalculateDarges
InRange
UpdateInfo
UpdateBattalion
DeltaFixSuppl
UpdatePosition
SetRestoration
ChangeSquadData
SetDamage
Movement
WMovement
TerrEffect
SetOutput
GetDifference
Greatest
Least
GetStatus
Distance

TABLE 17 - METRICS FOR VERSION 8

NAME OF MODULE

Conflict
UpdateU
SquadPos
LinearDistance
Altitude
Velocity
WSevFactor
SetKU
InitVariables
CalcBI
VisContrast
Movement
TerrainEffect
WeatherMoveEffect
Observation
FindAngle
SumObJam
WObsEffect
Observable
Height
Attrition
NumOfWeapons
SetAttacked
LengthOfList
ResetObserveLists
Restoration
UpdateVars
UpdateFS
UpdateFF
UpdateK
CalcCas
CalcBK
UpdateNums
UpdateE
UpdateKA
UpdateKU
ClearAttackLists
PrepareOutput

NAME OF MODULE v(G) BW NC STM FAULTS
SetLocation 10 2.90 29 42 1 
SetStatus 3 1.33 4 15 - 
Ceiling 2 1 2 6 -
InsertMsg 8 2.25 18 28 - 
CommandMsg 3 1.33 4 11 -
InsertCom 1 1 1 18 - 
ReportMsg 2 1 2 7 - 
InsertRep 3 1 3 27 - 
Communication 1 1 1 7 -
ReceiveDelay 6 1.33 8 22 - 
CalRDelay 4 1.50 6 27 - 
QueDelay 6 1.50 9 20 - 
PutQueue 9 2.44 22 25 4 
ProcessQue 5 2 10 20 - 
ProcessMsg 9 2.22 20 28 -
MergeRepMsg 3 1.33 4 9 - 
MergeComMsg 4 1 4 34 1 

TABLE 18 - METRICS FOR VERSION 9


NAME OF MODULE v(G) BW NC STM FAULTS 


Conflict 2 1 2 52 - 
D 1 1 1 8 - 
Ceiling 3 1.33 4 4 - 
WF 6 2.17 13 18 - 
WMorWO 8 2.13 17 14 - 
SquadsPos 10 2 20 40 1
CalcVg 5 2 10 13 7
Initialize 11 3.36 37 33 -
CheckBattalions 33 15.17 527 15 - 
CheckData 9 1.56 14 11 2 
Z 1 1 1 12 2
TM 2 1 2 8 2
BI 1 1 1 11 2
Movement 5 2.20 11 23 - 
Observation 16 4.56 73 39 2 
Restoration 15 4.40 66 45 1 
GetAngleCornerPts 7 1.57 11 23 -
Overlap 23 3.08 71 38 -
CheckAngle 3 1 3 11 2
CheckZ 4 1.25 5 14 1
GetOJ 4 1.50 6 11 -
CheckContrasts 3 1 3 8 -
CalcFgj 6 2 12 12 - 
UpdateFFg 5 1.33 4 11 - 
UpdateNFg 2 1 2 8 - 
UpdateFSg 3 1 3 11 -
GetLocation 2 1 2 11 -
GetLenLL 2 1 2 10 -
Attrition 1 1 1 8 - 
Inflict 9 4 36 25 - 
GetNW 8 2.75 22 13 1 
Suffer 4 1.75 7 6 - 
WearTear 6 2.67 16 10 - 
AddLL 4 1.75 7 15 - 
Send 10 2.70 27 29 1 
HasObs 2 1 2 8 - 
InsertQ 5 2 10 17 -
Receive 16 3.94 63 83 1 
Process 16 3.69 59 57
ChangeBatEnv 1 4 23
Communicate 1 1 1 6
OutputReport 18 2.78 50 35

TABLE 19 - ANALYSIS OF VARIANCE OF FAULTS WITH v(G)

Version  Source of       Sum of     Degrees of  Mean     F-Ratio  Significance
Number   Variation       Squares    Freedom     Square            Level

1        Between Groups  16.8234                0.93463  3.340    0.0003
         Within Groups   14.82936               0.27979
2        Between Groups  11.18333               0.69896  4.729
         Within Groups   5.61667                0.14781
3        Between Groups                                  2.156    0.0399
         Within Groups
4        Between Groups                         2.17966  6.668
         Within Groups                          0.32689
5        Between Groups                         2.82797  5.690    0.0029
         Within Groups                          0.49697
6        Between Groups  12.14627               0.71448  3.575    0.0001
         Within Groups   11.59058               0.19984
7        Between Groups                         1.60862  7.779
         Within Groups                          0.20678
8        Between Groups                         0.66965  1.590
         Within Groups                          0.42104

TABLE 20 - ANALYSIS OF VARIANCE OF FAULTS WITH BW

Version  Source of       Sum of     Degrees of  Mean     F-Ratio  Significance
Number   Variation       Squares    Freedom     Square            Level

1        Between Groups  21.64520   30          0.72151  2.956
         Within Groups   10.00758   41          0.24408
2        Between Groups  11.4933    20          0.57467  3.682
         Within Groups   5.30667    34          0.15608
3        Between Groups  29.95560   17          1.76209  1.688
         Within Groups   26.09091   25          1.04363
4        Between Groups  51.37127   32          1.60535  18.025
         Within Groups   2.13750    24          0.08906
5        Between Groups  44.31428   18          2.46190  3.462    0.0309
         Within Groups   6.40000    9           0.71111
6        Between Groups  14.60223   28          0.52151  2.683
         Within Groups   9.134615   47          0.19435
7        Between Groups  19.03387   20          0.95169  2.822
         Within Groups   15.84849   47          0.33720
8        Between Groups  17.69777   22          0.80444  3.086
         Within Groups   8.86364    34          0.26069

TABLE 21 - ANALYSIS OF VARIANCE OF FAULTS WITH NC

Version  Source of       Sum of     Degrees of  Mean     F-Ratio  Significance
Number   Variation       Squares    Freedom     Square            Level

1        Between Groups  22.04722               0.68898  2.797    0.0012
         Within Groups   9.60556                0.24629
2        Between Groups  15.17592               0.65982  12.502
         Within Groups   1.58333                0.05278
3        Between Groups  48.98545               2.32835  6.797
         Within Groups   7.53636                0.34256
4        Between Groups  48.64211               1.67731           0.0000
         Within Groups   4.86667                0.18025
5        Between Groups  45.41429               2.39023  3.608    0.0343
         Within Groups                          0.66250
6        Between Groups  14.16322               0.50583  2.483    0.0029
         Within Groups   9.57362                0.20369
7        Between Groups  26.14231               1.24487  6.873
         Within Groups   7.60769                0.18114
8        Between Groups  9.30585                0.48978  1.050    0.4345
         Within Groups   17.2556                0.46636

TABLE 22 - ANALYSIS OF VARIANCE OF FAULTS WITH STM

Version  Source of       Sum of     Degrees of  Mean     F-Ratio  Significance
Number   Variation       Squares    Freedom     Square            Level

1        Between Groups  18.06944   34          0.53145  1.448    0.1361
         Within Groups   13.58333   37          0.36712
2        Between Groups  5.13333    24          0.21389  0.550    0.9316
         Within Groups   11.66667   30          0.38889
3        Between Groups  41.62985   21          1.98237  2.888    0.0094
         Within Groups   14.41667   21          0.68651
4        Between Groups  45.17544   32          1.41173
         Within Groups   8.33333    24          0.34722
5        Between Groups  46.04762   22          2.09307  2.243    0.1884
         Within Groups   4.66667    5           0.93333
6        Between Groups  14.78880   32          0.46213  2.221    0.0075
         Within Groups   8.94881    43          0.20812
7        Between Groups  19.03387   20          0.95169  2.822    0.0018
         Within Groups   15.84849   47          0.33720
8        Between Groups  20.22807   26          0.77800  3.685
         Within Groups   6.33333    30          0.21111
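
The entries in the analysis-of-variance tables are related by the usual one-way ANOVA
arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and
the F-ratio is the between-groups mean square divided by the within-groups mean square.
The short Python check below, a sketch whose function names are hypothetical, recomputes
the first pair of rows of Table 22 from the tabulated sums of squares and degrees of
freedom.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


def f_ratio(ss_between, df_between, ss_within, df_within):
    """F-ratio = between-groups mean square / within-groups mean square."""
    return mean_square(ss_between, df_between) / mean_square(ss_within, df_within)


# First pair of rows of Table 22 (faults with STM).
ms_between = mean_square(18.06944, 34)
ms_within = mean_square(13.58333, 37)
f = f_ratio(18.06944, 34, 13.58333, 37)

if __name__ == "__main__":
    # Compare with the tabulated 0.53145, 0.36712 and 1.448.
    print(round(ms_between, 5), round(ms_within, 5), round(f, 3))
```

The same check can be applied to any pair of rows for which both sums of squares and both
degrees of freedom are legible.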
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